{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Copula - Multivariate joint distribution"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "import seaborn as sns\n",
    "from scipy import stats\n",
    "\n",
    "sns.set_style(\"darkgrid\")\n",
    "sns.mpl.rc(\"figure\", figsize=(8, 8))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%javascript\n",
    "IPython.OutputArea.prototype._should_scroll = function(lines) {\n",
    "    return false;\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When modeling a system, there are often cases where multiple parameters are involved. Each of these parameters could be described with a given Probability Density Function (PDF). If would like to be able to generate a new set of parameter values, we need to be able to sample from these distributions-also called marginals. There are mainly two cases: *(i)* PDFs are independent; *(ii)* there is a dependency. One way to model the dependency it to use a **copula**."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Sampling from a copula\n",
    "\n",
    "Let's use a bi-variate example and assume first that we have a prior and know how to model the dependence between our 2 variables.\n",
    "\n",
    "In this case, we are using the Gumbel copula and fix its hyperparameter `theta=2`. We can visualize it's 2-dimensional PDF."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "from statsmodels.distributions.copula.api import (\n",
    "    CopulaDistribution, GumbelCopula, IndependenceCopula)\n",
    "\n",
    "copula = GumbelCopula(theta=2)\n",
    "_ = copula.plot_pdf()  # returns a matplotlib figure"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And we can sample the PDF."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "sample = copula.rvs(10000)\n",
    "h = sns.jointplot(x=sample[:, 0], y=sample[:, 1], kind=\"hex\")\n",
    "_ = h.set_axis_labels(\"X1\", \"X2\", fontsize=16)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's come back to our 2 variables for a second. In this case we consider them to be gamma and normally distributed. If they would be independent from each other, we could sample from each PDF individually. Here we use a convenient class to do the same operation.\n",
    "\n",
    "### Reproducibility\n",
    "\n",
    "Generating reproducible random values from copulas required explicitly setting the `seed` argument.\n",
    "`seed` accepts either an initialized NumPy `Generator` or `RandomState`, or any argument acceptable\n",
    "to `np.random.default_rng`, e.g., an integer or a sequence of integers. This example uses an\n",
    "integer.\n",
    "\n",
    "The singleton `RandomState` that is directly exposed in the `np.random` distributions is\n",
    "not used, and setting `np.random.seed` has no effect on the values generated."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "marginals = [stats.gamma(2), stats.norm]\n",
    "joint_dist = CopulaDistribution(copula=IndependenceCopula(), marginals=marginals)\n",
    "sample = joint_dist.rvs(512, random_state=20210801)\n",
    "h = sns.jointplot(x=sample[:, 0], y=sample[:, 1], kind=\"scatter\")\n",
    "_ = h.set_axis_labels(\"X1\", \"X2\", fontsize=16)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, above we have expressed the dependency between our variables using a copula, we can use this copula to sample a new set of observation with the same convenient class."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "joint_dist = CopulaDistribution(copula, marginals)\n",
    "# Use an initialized Generator object\n",
    "rng = np.random.default_rng([2, 0, 2, 1, 0, 8, 0, 1])\n",
    "sample = joint_dist.rvs(512, random_state=rng)\n",
    "h = sns.jointplot(x=sample[:, 0], y=sample[:, 1], kind=\"scatter\")\n",
    "_ = h.set_axis_labels(\"X1\", \"X2\", fontsize=16)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There are two things to note here. *(i)* as in the independent case, the marginals are correctly showing a gamma and normal distribution; *(ii)* the dependence is visible between the two variables."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Estimating copula parameters\n",
    "\n",
    "Now, imagine we already have experimental data and we know that there is a dependency that can be expressed using a Gumbel copula. But we don't know what is the hyperparameter value for our copula. In this case, we can estimate the value.\n",
    "\n",
    "We are going to use the sample we just generated as we already know the value of the hyperparameter we should get: `theta=2`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "copula = GumbelCopula()\n",
    "theta = copula.fit_corr_param(sample)\n",
    "print(theta)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can see that the estimated hyperparameter value is close to the value set previously."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
